Efficient and exact sampling of simple graphs with given arbitrary degree sequence 
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Uniform sampling from graphical realizations of a given degree sequence is a fundamental com- 
ponent in simulation-based measurements of network observables, with applications ranging from 
epidemics, through social networks to Internet modeling. Existing graph sampling methods are 
either link-swap based (Markov-Chain Monte Carlo algorithms) or stub-matching based (the Con- 
figuration Model). Both types are ill-controlled, with typically unknown mixing times for link-swap 
methods and uncontrolled rejections for the Configuration Model. Here we propose an efficient, 
polynomial time algorithm that generates statistically independent graph samples with a given, ar- 
bitrary, degree sequence. The algorithm provides a weight associated with each sample, allowing 
the observable to be measured either uniformly over the graph ensemble, or, alternatively, with 
a desired distribution. Unlike other algorithms, this method always produces a sample, without 
back-tracking or rejections. Using a central limit theorem-based reasoning, we argue, that for large 
N, and for degree sequences admitting many realizations, the sample weights are expected to have 
a lognormal distribution. As examples, we apply our algorithm to generate networks with degree 
sequences drawn from power-law distributions and from binomial distributions. 



I. INTRODUCTION 

Network representation has become an increasingly 
widespread methodology of analysis to gain insight into 
the behavior of complex systems, ranging from gene reg- 
ulatory networks to human infrastructures such as the 
Internet, power- grids and airline transportation, through 
metabolism, epidemics and social sciences UM- These 
studies are primarily data driven, where connectivity in- 
formation is collected, and the structural properties of 
the resulting graphs are analyzed for modeling purposes. 
However, rather frequently, full connectivity data is un- 
available, and the modeling has to resort to consider- 
ations on the class of graphs that obeys the available 
structural data. A rather typical situation is when the 
only information available about the network is the de- 
gree sequence of its nodes V = {do, d\, . . . , djy-i}. For 
example, in epidemiology studies of sexually transmit- 
ted diseases [5J, anonymous surveys may only collect the 
number of sexual partners of a person in a given period of 
time, not their identity. Epidemiologists are then faced 
with constructing a typical contact graph having the ob- 
served degree sequence, on which disease spread scenar- 
ios can be tested. Another reason for studying classes or 
ensembles of graphs obeying constraints comes from the 
fact that the network structure of many large-scale real- 
world systems is not the result of a global design, but of 
complex dynamical processes with many stochastic ele- 
ments. Accordingly, a statistical mechanics approach [1] 
can be employed to characterize the collective proper- 
ties of the system emerging from its node level (micro- 
scopic) properties. In this approach, statistical ensembles 
of graphs are defined [f| 0], representing "connectivity 
microstates" from which macroscopic system level prop- 
erties are inferred via averaging. Here we focus on the 
degree as a node characteristic, which could represent, 



for example, the number of friends of a person, the va- 
lence of an atom in a chemical compound, the number of 
clients of a router, etc. 

In spite of its practical importance, finding a method 
to construct degree-based graphs in a way that allows 
the corresponding graph ensemble to be properly sam- 
pled has been a long-standing open problem in the net- 
work modeling community (references using various ap- 
proaches are given below). Here we present a solution 
to this problem, using a biased sampling approach. We 
consider degree-based graph ensembles on two levels: 1) 
sequence-level, where a specific sequence of degrees is 
given, and 2) distribution level, where the sequences are 
themselves drawn from a given degree distribution P (d). 
In the remainder we will focus on the fundamental case 
of labeled, undirected simple graphs. In a simple graph 
any link connects a single pair of distinct nodes and self 
loops and multiple links between the same pair of nodes 
are not allowed. Without loss of generality, consider a 
sequence of N positive integers T> — {do, d\,..., d^-i}, 
arranged in non-increasing order: do ^ d\ ■ • ■ ^ djv— i- 
If there is at least one simple graph G with degree se- 
quence T>, the sequence T> is called a graphical sequence 
and we say that G realizes T>. Note that not every 
sequence of positive integers can be realized by simple 
graphs. For example, there is no simple graph with de- 
gree sequence {3,2,1} or {5,4,3,2,1,1}, while the se- 
quence {3, 3, 2, 2, 2} can obviously be realized by a simple 
graph. In general, if a sequence is graphical, then there 
can be several graphs having the same degree sequence. 
Also note that given a graphical sequence, the careless 
or random placing of links between the nodes may not 
result in a simple graph. 

Recently, a direct, swap-free method to systematically 
construct all the simple graphs realizing a given graph- 
ical sequence T> was presented Q. However, in general 
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(for exceptions see Ref. @), the number of elements of the 
set Q (T>) of all graphs that realize sequence T>, increases 
very quickly with N: a simple upper bound is provided 
by the number of all graphs with sequence T>, allowing 
for multiple links and loops: \Q (T>)\ ^ Elilo 1 ^' ■ Thus, 
typically, systematically constructing all graphs with a 
given sequence T> is practical only for short sequences, 
such as when determining the structural isomers of alka- 
nes For larger sequences, and in particular for model- 
ing real- world complex networks, it becomes necessary to 
sample Q (T>) . Accordingly, several variants based on the 
Markov Chain Monte Carlo (MCMC) method were de- 
veloped. They use link-swaps [10| ("switches") to produce 
pseudo-random samples from Q (T>) . Unfortunately, most 
of them are based on heuristics, and apart from some 
special sequences, little has been rigorously shown about 
the methods' mixing time, and accordingly they are ill- 
controlled. The literature on such MCMC methods is 
simply too extensive to be reviewed here, instead, we re- 
fer the interested reader to Refs [Tl| - [l3l and the references 
therein. Finally, we recall the main swap-free method 
producing uniform random samples from Q (T>), namely 
the configuration model (CM) (l34l7j . This method 
picks a pair of nodes uniformly at random and connects 
them, until a rejection occurs due to a double link or a 
self-loop, in which case it restarts from the very begin- 
ning. For this reason, the CM can become very slow, as 
shown in the Discussion section. The CM has inspired ap- 
proximation methods as well [l8[ and methods that con- 
struct random graphs with given expected degrees [19j. 

Here, by developing new results from the theorems in 
Ref. H, we present an efficient algorithm that solves this 
fundamental graph sampling problem, and it is exact in 
the sense that it is not based on any heuristics. Given 
a graphical sequence, the algorithm always finishes with 
a simple graph realization in polynomial time, and it is 
rejection free. While the samples obtained are not uni- 
formly generated, the algorithm also provides the exact 
weight for each sample, which can then be used to pro- 
duce averages of arbitrary graph observables measured 
uniformly, or following any given distribution over Q (T>) . 

II. MATHEMATICAL FOUNDATIONS 

Before introducing the algorithm, we state some results 
that will be useful later on. We begin with the Erdos- 
Gallai (EG) theorem [20j], which is a fundamental result 
that allows us to determine whether a given sequence of 
non-negative integers, called "degree sequence" hereafter, 
is graphical. 

Theorem 1 (Erdos-Gallai). A non-increasing degree se- 
quence T> = {do, di, ■ ■ ■ , djsi-i} is graphical if and only if 
their sum is even and, for all ^5 k < N — 1 : 

k N-1 

s; fc(fc + i)+ min { fc + Mi}- (!) 

i=0 i=fe+l 



A necessary and sufficient condition for the graphical- 
ly of a degree sequence, which is constrained from having 
links between some node and a "forbidden set" of other 
nodes is given by the star-constrained graphically theo- 
rem [8|. In this case the forbidden links are all incident 
on one node and thus form a "star". To state the the- 
orem, we first define the "leftmost adjacency set" of a 
node i with degree di in a degree sequence T> as the set 
consisting of the di nodes with the largest degrees that 
are not in the forbidden set. If T> is non- increasing, then 
the nodes in the leftmost adjacency set are the first di 
nodes in the sequence that are not in the forbidden set. 
The forbidden set could represent nodes that are either 
already connected to i, and thus subsequent connections 
to them are forbidden, or just imposed arbitrarily. Using 
this definition, the theorem is: 

Theorem 2 (Star-constrained graphical sequences). Let 
= {do, di, . . . , d^-i} be a non-increasing graphical de- 
gree sequence. Assume there is a set of forbidden links 
incident on a node i. Then a simple graph avoiding the 
forbidden links can be constructed if and only if a simple 
graph can be constructed where i is connected to all the 
nodes in its leftmost adjacency set. 

A direct consequence Q of Theorem [5] for the case of 
an em pty forbidden set is the well-known Havel-Hakimi 
result |2ll . [22| , which in turn implies: 

Corollary 1. Let T> = {do, d\, . . . , djf-i} be a 
non-increasing unconstrained graphical degree sequence. 
Then, given any node i, there is a realization of T> that 
includes a link between the first node and i. 

Another result we exploit here is Lemma 3 of Ref. [H, 
extended to star-constrained sequences: 

Lemma 1. Let T> be a graphical sequence, possibly 
with a star constraint incident on node i. Let j and 
k be distinct nodes not in the forbidden set and dif- 
ferent from i, such that dj > dk- Then V = 
{do, ■ ■ ■ , dj — 1, . . . , dk + 1, • • • j dj^-i} is also a graphical 
sequence with the same star constraint. 

Proof. Let Xi denote the set of nodes forbidden to con- 
nect to node i. Since T> is star-constrained graphical 
there is a simple graph G realizing the sequence with no 
connections between i and Xi. Since dj > dk, there is a 
node m to which j is connected but k is not. Note that 
m could be in Xi U {i}. Now cut the edge (m, j) of G cre- 
ating a stub at m and another at j . Remove the stub at j 
so that its degree becomes dj — 1, and add a stub at k so 
that its degree becoming dk + 1. Since there are no con- 
nections in G between m and k, connect the two stubs at 
these nodes creating a simple graph G' thus realizing V . 
Clearly there are still no connections between i and Xi 
in G' , and thus V is also star-constrained graphical. □ 

Finally, using Lemma [1] and Theorem (5J we prove: 
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Theorem 3. Let T> be a degree sequence, •possibly with 
a star- constraint incident on node i, and let y and z be 
two nodes with degrees such that d y ^ d z that are not 
constrained from linking to node i . If the residual degree 
sequence T>' obtained from T> by reducing the degrees at i 
and y by unity is not graphical, then the degree sequence 
T>" obtained from T> by reducing the degrees at i and z by 
unity is also not graphical. 

Proof. By definition, d\ = d t for I e {0, . . . , N - 1} \ 
{i,y} and d[ = di — 1, d' y = d y — 1; d" = di for 
I e {0, ...,N- 1} \ {i,z} and d" = di - 1, d" z = 
d z — 1. We consider di > d y , however, the proof is 
not affected by this assumption. By assumption, V 
is not graphical. Using proof by contradiction, as- 
sume that V" = {. . . , di — 1, . . . , d y , . . . , d z — 1, . . .} is 
graphical. Clearly, d y > d z — 1, and thus we can ap- 
ply Lemma Q] on this sequence. As a result, the se- 
quence {. . . , di — 1, . . . , d y — 1, . . . , d z — 1 + 1, . . .}, that 
is exactly V is graphical, a contradiction. □ 

Note that if a sequence is non-graphical, then it is not 
star-constrained graphical either, and thus Theorem [3] is 
in its strongest form. 

III. BIASED SAMPLING 

The sampling algorithm described below is ergodic in 
the sense that every possible simple graph with the given 
finite degree sequence is generated with non-zero prob- 
ability. However, it does not generate the samples with 
uniform probability; the sampling is biased. Neverthe- 
less, the algorithm can be used to compute network ob- 
servables that are unbiased, by appropriately weighing 
the averages measured from the samples. According to 
a well known principle of biased sampling |23l . l'2-ll | , if the 
relative probability of generating a particular sample Sj 
is p Si , then an unbiased estimator for an observable Q 
measured from a set of M randomly generated samples 
si, S2, . . . , sm is the weighted average 

(Q) = , (2) 

where the weights are w Si = pj , and the denominator 
is a normalization factor. The key to this method is to 
find the appropriate weight w Si to associate with each 
sample. Note that in addition to uniform sampling, it is 
in fact possible to sample with any arbitrary distribution 
by choosing an appropriate set of sample weights. 

IV. THE ALGORITHM 

Let T> be a non-increasing graphical sequence. We wish 
to sample the set Q (D) of graphs that realize this se- 
quence. The graphs can be systematically constructed 
by forming all the links involving each node. To do so, 



begin by choosing the first node in the sequence as the 
"hub" node and then build the set of the "allowed nodes" 
A = {a\, a2, . . . , ak} that can be connected to it. A con- 
tains all the nodes that can be connected to the hub such 
that if a link is placed between the hub and a node from 
A, then a simple graph can still be constructed, thus 
preserving graphicality. Choose uniformly at random a 
node a € A, and place a link between a and the hub. If 
a still has "stubs", i.e. remaining links to be placed, then 
add it to the set of "forbidden nodes" X that contains 
all the nodes which can't be linked anymore to the hub 
node and which initially contains only the hub; other- 
wise, if a has no more stubs to connect, then remove it 
from further consideration. Repeat the construction of 
A and link the hub with one of its randomly chosen el- 
ements until the stubs of the hub are exhausted. Then 
remove the hub from further consideration, and repeat 
the whole procedure until all the links are made and the 
sample construction is complete. Each time the proce- 
dure is repeated, the degree sequence V considered is the 
"residual degree sequence", that is the original degree se- 
quence reduced by the links that have previously been 
made, and with any zero residual degree node removed 
from the sequence. Then, choose a new hub, empty the 
set of forbidden nodes X and add the new hub to it. It is 
convenient, but not necessary, to choose the new hub to 
be a node with maximum degree in the residual degree 
sequence. 

The sample weights needed to obtain unbiased esti- 
mates using Eq. [2] are the inverse relative probabilities 
of generating the particular samples. If in the course 
of the construction of the sample m different nodes 
i = 1, 2, . . . , m are chosen as the hub and they have di 
residual degrees when they are chosen, then this sample 
weight can be computed by first taking the product of 
the sizes ki . of the allowed sets A constructed, then di- 
viding this quantity by a combinatorial factor which is 
the product of the factorials of the residual degrees of 
each hub: 

m -. di 

The weight accounts for the fact that at each step the 
hub node has ki . nodes it can be linked to, which is the 
size of the allowed set at that point, and that the number 
of equivalent ways to connect the residual stubs of a new 
hub is di\. Note that it is always true that w ^ 1, with 
w = 1 occurring for sequences for which there is only one 
possible graph. 

A. Building the allowed set 

The most difficult step in the sampling algorithm is to 
construct the set of allowed nodes A. In order to do so 
first note that Theorem [3] implies that if a non- forbidden 
node, that is a node not in X, can be added to A, then 
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all non-forbidden nodes with equal or higher degree can 
also be added to A. Conversely, if it is determined that a 
non-forbidden node cannot be added to A, then all nodes 
with equal or smaller degree also cannot be added to A. 
Therefore, referring to the degrees of nodes that can- 
not be added to A as "fail-degrees", the key to efficiently 
construct A is to determine the maximum fail-degree, if 
fail-degrees exist. 

The first time A is constructed for a new hub, accord- 
ing to Corollary [TJ there is no fail-degree and A consists 
of all the other nodes. However, constructing A becomes 
more difficult once links have been placed from the hub 
to other nodes. In this case, to find the maximum fail- 
degree note that at any step during the construction of 
a sample the residual sequence being used is graphical. 
Then, since according to Theorem [5] any connection to 
the leftmost adjacency set of the hub preserves graphi- 
cally, it follows from Theorem [3] that any fail-degree has 
to be strictly less than the degree of any node in the 
leftmost adjacency set of the hub. 

If there are non-forbidden nodes in the residual degree 
sequence that have degree less than any in its leftmost ad- 
jacency set, then the maximum fail-degree can be found 
with a procedure that exploits Theorem [5] In particu- 
lar, if the hub is connected to a node with a fail-degree, 
then, by Theorem [21 even if all the remaining links from 
the hub were connected to the remaining nodes in the 
leftmost adjacency set, the residual sequence will not be 
graphical. Our method to find fail-degrees, given below, 
is based on this argument. 

Begin by constructing a new residual sequence T>' by 
temporarily assuming that links exist between the hub 
and all the nodes in its leftmost adjacency set except 
for the last one, which has the lowest degree in the set. 
The nodes temporarily linked to the hub should also be 
temporarily added to the set of forbidden nodes X. The 
nodes in V should be ordered so that it is non-increasing, 
that forbidden nodes appear before non-forbidden nodes 
of the same degree, and that the hub, which now has 
residual degree 1, is last. 

At this point, in principle one could find the maximum 
fail degree by systematically connecting the last link of 
the hub with non-forbidden nodes of decreasing degree, 
and testing each time for graphically using Theorem [T] 
If it is not graphical then the degree of the last node con- 
nected to the hub is a fail-degree, and the node with the 
largest degree for which this is true will have the max- 
imum fail-degree. However, this procedure is inefficient 
because each time a new node is linked with the hub the 
residual sequence changes and every new sequence must 
be tested for graphicality. 

A more efficient procedure to find the maximum fail- 
degree instead involves only testing the sequence V . To 
see how this can be done, note that V is a graphical 
sequence, by Theorem [2] Thus, by Theorem [1] for all 
relevant values of fc, the left hand side of Inequality [U 
Lk, and the right hand side of it, Rk, satisfy ^ Rk- 
Furthermore, for the purposes of finding fail-degrees it is 



sufficient to consider linking the final stub of the hub with 
only the last non-forbidden node of a given degree, if any 
exists. After any such link is made, the resulting degree- 
sequence T>" will be non-increasing, and thus Theorem 
1 can be applied to test it for graphicality. Therefore, if 
the degree of the node connected with the last stub of 
the hub is a fail-degree, then Inequality Q] for T>" must 
fail for some fc. For each fc, the possible differences in Lk 
and Rk between V and T>" are as follows. Rk is always 
reduced by 1 because the residual degree of the hub is 
reduced from 1 to 0. Rk may be reduced by an another 
factor of 1 if the last node connected to the hub, having 
index x and degree d x , is such that x > k and d x < fc + 2. 
Lk is reduced by 1 if x ^ k, otherwise it is unchanged. 

Considering these conditions that can cause Inequal- 
ity [1] to fail for T>" , the set of allowed nodes A can be 
constructed with the following algorithm that requires 
only testing T>' . Starting with k — 0, compute the values 
of Lk and Rk for V . There are three possible cases: (I) 
L k = Rk, (2) Lk = Rk-1, and (3) L k R k - 2. In 
case (1) fail-degrees occur whenever L k is unchanged by 
making the final link to the hub. Thus, the degree of the 
first non-forbidden node whose index is greater than k 
is the largest fail-degree found with this value of A:. In 
case (2) fail-degrees occur whenever Lk is unchanged and 
Rk is reduced by 2 by making the final link to the hub. 
Thus, the degree of the first non-forbidden node whose 
index is greater than k and whose degree is less than fc + 2 
is the largest fail-degree found with this value of fc. In 
case (3) no fail-degree can be found with this value of 
fc. Repeat this process sequentially increasing fc, until all 
the relevant fc values have been considered, then retain 
the maximum fail-degree. It can be shown that the al- 
gorithm can be stopped either after a case (1) occurs, or 
after k = r where r is the lowest index of any node in 
V with degree d' r < r. Once the maximum fail-degree is 
found, remove the nodes that were temporarily added to 
X and construct A by including all non-forbidden nodes 
of V with a higher degree. If no fail-degree is ever found, 
then all non-forbidden nodes of T> are included in A. A 
will always include the leftmost adjacency set of the hub 
and any non-forbidden nodes of equal degree. 

Note that after a link is placed in the sample construc- 
tion process, the residual degree sequence T> changes, and 
therefore, A has to be determined every time. 



B. Implementing the Erdos-Gallai test 

Finally, Lk and Rk should be calculated efficiently. 
Calculating the sums that comprise them for each new 
value of fc can be computationally intensive, especially for 
long sequences. Even computing them only for as many 
distinct terms as there are in the sequence, as suggested 
in Ref. [25], can still become slow if the degree distribu- 
tion is not quickly decreasing. Instead, it is much more 
efficient to use recurrence relations to calculate them. 
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Figure 1: Probability distribution p of the logarithm of 
weights for an ensemble of power-law sequences with N = 100 
and 7 = 3. The ensemble contained 2 x 10 4 graphical se- 
quences, and for each sequence 10 6 graph samples were pro- 
duced. Thus, the total number of samples produced was 
2 x 10 10 . The simulation data is given by the solid black 
line and a Gaussian fit of the data is shown by the dashed red 
line that nearly obscures the black line. 



A recurrence relation for Lk is simply 

Lk = Lk-i + d k , 



(4) 



with Lq = do. 

For non-increasing degree sequences, define the 
"crossing-index" x k for each k as the index of first node 
that has degree less than fc+1, that is for which di < k+1 
for all i x k . If no such index exists, such as for k = 
since the minimum degree of any node in the sequence is 
1, then set Xk = N. Then, a recurrence relation for Rk 
is 

R k = Rk-i+x k -l-(x k -l-2k+d k ) Q(k+l-x k ) (5) 

where is a discrete equivalent of the Heaviside function, 
defined to be 1 on positive integers and otherwise, and 
Ro = N—l. Or, since the crossing-index can not increase 
with k, that is x k x k -\ for all fc, a value k* will exist 
for which x k < k + 1 for all k ^ k* , and so Eq. [5] can be 
written 



Rk 



Rk- 
Rk- 



x k -l 
2k - d k 



for k < k* 
for k>k* 



(G) 



Thus, there is no need to find Xk for k > k*. 

Using Eqs. 2] and El the mechanism of the calculation 
of L k and R k at sequential values of k is shifted from a 
slow repeated calculation of sums of many terms to the 
much less computationally intensive task of calculating 
the recurrence relations. In order to perform the test 
efficiently, a table of the values of crossing-index Xk for 
each relevant k can be created as V is constructed. 

It should be noted that the usefulness of this method 
for calculating Lk and R k is broader than its use for cal- 
culating fail-degrees in our sampling algorithm. In par- 
ticular, it can be used in an Erdos-Gallai test to efficiently 
determine whether a degree-sequence is graphical. 



Figure 2: Mean m and standard deviation a of the distri- 
butions of the logarithm of the weights vs. number of nodes 
N of samples from an ensemble of power-law sequences with 
7 = 3. The black circles correspond to m, the red squares 
correspond to a. The error bars are smaller than the sym- 
bols. The solid black line and the dashed red line show the 
outcomes of fits on the data. The linearity of the data on a 
logarithmic scale indicates that the m and a follow power-law 
scaling relations with N: m ~ N a and a ~ . The slopes 
of the fit lines are an estimate of the value of the exponents: 
a = f .22042 ± 0.00007 and /3 = 0.8599 ± O.OOf 8. 



V. SAMPLE WEIGHTS 

As previously stated, the weight w associated with a 
particular sample, given by Eq. [3l is the product of the 
sizes h of all the sets of allowed nodes that have been 
built for each hub node i divided by the product of the 
factorials of the initial residual degrees of each hub node. 
The logarithm of this weight is 



logw = ^2 



i=l 



] log h 



log (dil) 



(7) 



Generally, degree sequences with N ^> 1 admit many 
graphical realizations. When this is true, each of the m 
terms in square brackets in Eq. [7] are effectively random 
and independent, and, by virtue of the central limit the- 
orem, their sum will be normally distributed. That is, 
the weight w of graph samples generated from a given 
degree sequence with large N is typically log-normally 
distributed. However, degree sequences with N ^> 1 that 
have only a small number of realizations do exist, and w 
is not expected to be log-normally distributed for those 
sequences. 

Furthermore, one can consider not just samples of a 
particular graphical sequence, but of an ensemble of se- 
quences. By a similar argument to that given above for 
individual sequences, the weight w of graph samples gen- 
erated from an ensemble of sequences will also typically 
be log-normally distributed in the limit of large N. For 
example, consider an ensemble of sequences of randomly 
chosen power-law distributed degrees, that is, sequences 
of random integers chosen from a probability distribu- 
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tion P{d) ~ dr 1 . Hereafter, we refer to such sequences 
as "power-law sequences." Figure [T] shows the probabil- 
ity distribution of the logarithm of weights for realiza- 
tions of power-law sequences with exponent 7 = 3 and 
N = 100. Note that this distribution is well approxi- 
mated by a Gaussian fit. 

We have also studied the behavior of the mean and 
the standard deviation of the probability distribution of 
the logarithm of the weights of such power-law sequences 
as a function of TV. As shown in Fig. [3J they scale as a 
power-law. We have found qualitatively similar results, 
including power-law scaling of the growth of the mean 
and variance of the distribution of logw, for binomially 
distributed degree sequences that correspond to those of 
Erdos-Renyi random graphs with node connection prob- 
ability p such that pN = 4, and for uniformly distributed 
degree sequences, that is power-law sequences with 7 = 0, 
with an upper limit, or cutoff, of y/~N for the degree of 
a node. However, for uniformly distributed degree se- 
quences without an imposed upper limit on node degrees, 
we find that the sample weights are not log-normally dis- 
tributed. 
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Figure 3: The estimated computational complexity of the al- 
gorithm for power-law sequences. The leading order of the 
computational complexity of the algorithm as a power of N, 
where N is the number of nodes, is plotted as a function of the 
degree distribution power-law exponent 7. The black circles 
correspond to ensembles of sequences without cutoff, while 
the red squares correspond to ensembles of sequences with 
structural cutoff in the maximum degree of d m ax = yfN ■ The 
fits that yielded the data points were carried out considering 
sequences ranging in size from N = 100 to N — 10 8 . 



VI. COMPLEXITY 

In this section we discuss the algorithm's computa- 
tional complexity. We first provide an upper bound on 
the worst case complexity, given a degree sequence T>. 
Then, using extreme value arguments, we conservatively 
estimate the average case complexity for degree sequences 
of random integers chosen from a distribution P(d). The 
latter is useful for realistically estimating the computa- 
tional costs for sampling graphs from ensembles of long 
sequences. 

To determine an upper bound on the worst case com- 
plexity for constructing a sample from a given degree 
sequence T>, recall that the algorithm connects all the 
stubs of the current hub node before it moves on to the 
hub node of the new residual sequence. For every stub 
from the hub one must construct the allowed set A. The 
algorithm for constructing A, which includes construct- 
ing T>' , performing the Lk vs Rk comparisons, and de- 
termining the maximum fail-degree, can be completed in 
0[N — j] steps, where N — j is the maximum possible 
number of nodes in the residual sequence after eliminat- 
ing j hubs from the process. Therefore, an upper bound 
on the worst case complexity C w of the algorithm given 
a sequence T> is: 

c„<o(x;(jv-i)d i )<o^x;d i j (8) 

where the sum involves at most O(N) terms. Equiva- 
lently, G w ^ O(NNi), with iVj being the number of links 
in the graph. For simple graphs, the maximum possi- 
ble number of links is 0(N 2 ), and the minimum possible 
number is Q(N). If iVj = 0(N), then C w < Q(N 2 ), and 



if Ni = 0{N 2 ), then C w sC 0(N 3 ), which is an upper 
bound, independent of the sequence. 

From Eq. [8j the expected complexity for the algorithm 
to construct a sample for a degree sequence of random 
integers chosen from a distribution P(d), normalized to 
unity, can be conservatively estimated as 

C ~ O r£(N - j)d-j . (9) 

Here dj is the expectation value for the degree of the 
node with index j , which is the largest degree for which 
the expected number of nodes with equal or larger degree 
is at least j + 1. That is, 

dj = max j d* : N jjT P(d) ^ j + 1 j . (10) 

I d=d' ) 

Notice that the sum in the above equation runs to the 
maximum allowed degree in the network d max , which is 
nominally N — 1, but a different value can be imposed. 
For example, in the case of power-law sequences, the so- 
called structural cutoff of d max ^ yfN is necessary if de- 
gree correlations are to be avoided [l^. l26l. l27l| . However, 
such a cutoff needs to be imposed only for 7 < 3, be- 
cause the expected maximum degree do in a power-law 
1 

network grows like N f- 1 . Thus, for 7 ^ 3, do grows no 
faster than y/N and no degree correlations exist for large 

n 

Given a particular form of distribution P(d), Eq. [9] 
can be computed for different values of N. Subsequent 
fits of the results to a power-law function allow the or- 
der of the complexity of the algorithm to be estimated. 



7 



Figure [5] shows the results of such calculations for power- 
law sequences with and without the structural cutoff of 
d max = y/N as a function of exponent 7. Note that, in 
the absence of cutoff, the results indicate that the order 
of the complexity goes to a value of 3 for 7 — > 0, that is, 
in the limit of a uniform degree distribution. However, 
if the structural cutoff is imposed the order of the com- 
plexity is only 2.5 in this limit. Both these results are 
easily verified analytically. 

We have tested the estimates shown in Fig. [3] with our 
implementation of the sampling algorithm for power-law 
sequences with and without the structural cutoff for cer- 
tain values of 7, including 0, 2, and 3. This was done by 
measuring the actual execution times for generating sam- 
ples for different N and fitting the results to a power-law 
function. In every case, the actual order of the complex- 
ity of our implementation of the sampling algorithm was 
equal to or slightly less than its estimated value shown 
in Fig. [3 

VII. DISCUSSION 

We have solved the long standing problem of how to ef- 
ficiently and accurately sample the possible graphs of any 
graphical degree sequence, and of any ensemble of degree 
sequences. The algorithm we present for this purpose 
is ergodic and is guaranteed to produce an independent 
sample in, at most, 0(N 3 ) steps. Although the algorithm 
generates samples non-uniformly, and, thus, it is biased, 
the relative probability of generating each sample can be 
calculated explicitly permitting unbiased measurements 
to be made. Furthermore, because the sample weights 
are known explicitly, the algorithm makes it possible to 
sample with any arbitrary distribution by appropriate 
re- weighting. 

It is important to note that the sampling algorithm 
is guaranteed to successfully and systematically proceed 
in constructing a graph. This behavior contrasts with 
that of other algorithms, such as the configuration model 
(CM), which can run into dead ends that require back- 
tracking or restarting, leading to considerable losses of 
time and potentially introducing an uncontrollable bias 
into the results. While there are classes of sequences 
for which it is perhaps preferable to use the CM instead 
of our algorithm, in other cases its performance rela- 
tive to ours can be remarkably poor. For example, a 
configuration model code failed to produce even a sin- 
gle sample of a uniformly distributed graphical sequence, 
P(d) = const., with N = 100, after running for more 
than 24 hours, while our algorithm produced 10 4 samples 
of the very same sequence in 30 seconds. Furthermore, 



each sample generated by our algorithm is independent. 
This behavior contrasts with that of algorithms based 
on MCMC methods. Because our algorithm works for 
any graphical sequence and for any ensemble of random 
sequences, it allows arbitrary classes of graphs to be stud- 
ied. 

One of the features of our algorithm that makes it ef- 
ficient is a method of calculating the left and right sides 
of the inequality in the Erdos-Gallai theorem using re- 
cursion relations. Testing a sequence for graphicality can 
thus be accomplished without requiring repeated com- 
putations of long sums, and the method is efficient even 
when the sequence is nearly non-degenerate. The use- 
fulness of this method is not limited to the algorithm 
presented for graph sampling, but can be used anytime 
a fast test of the graphicality of a sequence of integers is 
needed. 

There are now over 6000 publications focusing on com- 
plex networks. In many of these publications various 
processes, such as network growth, flow on networks, 
epidemics, etc., are studied on toy network models used 
as "graph representatives" simply because they have be- 
come customary to study processes on. These include the 
Erdos-Renyi random graph model, the Barabasi-Albert 
preferential attachment model, the Watts-Strogatz small- 
world network model, random geometric graphs, etc. 
However, these toy models are based on specific pro- 
cesses that constrain their structure beyond their degree- 
distribution, which in turn might not actually correspond 
to the processes that have led to the structure of the net- 
works investigated with them, thus potentially introduc- 
ing dangerous biases in the conclusions of these studies. 
The algorithm presented here provides a way to study 
classes of simple graphs constrained solely by their de- 
gree sequence, and nothing else. However, additional 
constraints, such as connectedness, or any functional of 
the adjacency matrix of the graph being constructed, can 
in principle be added to the algorithm to further restrict 
the graph class built. 
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